
Time Series Features

Lecture 4

What is a time series “feature,” and why are features useful?

A feature is a single number that summarizes one property of a time series.
Rather than staring at 500 individual time plots, features let us compare many series at once by reducing each to a small set of descriptive statistics.
Examples: the mean, the variance, the lag-1 autocorrelation, the strength of trend, the strength of seasonality.
In fpp3, features(data, variable, feature_functions) computes a table of features for one or many series simultaneously. This is the basis for large-scale automated forecasting.

Simple statistical features

Location: mean, median, quantiles.
  • The mean is the most common summary, but the median is more robust to outliers.
  • Quantiles describe the distribution of values across time.
Spread: variance, standard deviation, IQR.
  • A series with high variance relative to its mean is harder to forecast precisely.
Shape: skewness, kurtosis.
  • Positive skew means occasional large upward spikes (e.g., storm damage claims).
  • High kurtosis means fat tails — extreme values occur more often than a normal distribution would predict.
In R: features(data, variable, list(mean = mean, var = var)). Naming the list elements gives readable column names.
  • Or use the pre-built feat_acf, feat_stl shortcuts.
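
A minimal sketch of the call, using the tourism data that ships with the tsibble package (any tsibble works):
# One row per series, one column per named summary statistic
library(fpp3)
tourism |>
  features(Trips, list(mean = mean, sd = sd, median = median))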

ACF-based features quantify the autocorrelation structure of a series.
  • acf1: the lag-1 autocorrelation. High values indicate strong short-run persistence.
  • acf10: the sum of squares of the first 10 autocorrelations. Captures overall autocorrelation magnitude.
  • diff1_acf1: the lag-1 ACF of the first-differenced series. A strongly negative value suggests over-differencing.
  • season_acf1: the ACF at the seasonal lag. A large value confirms strong seasonality.
In fpp3: features(data, variable, feat_acf).
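
For example, on the tourism data (one row per series, with columns acf1, acf10, diff1_acf1, season_acf1, and so on):
library(fpp3)
tourism |>
  features(Trips, feat_acf)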

How do we measure the strength of trend and seasonality in a single number?

STL decomposition yields two powerful features: trend strength and seasonal strength.
Trend strength compares the variance of the remainder to the variance of the deseasonalized series (trend plus remainder):
FT = max(0, 1 − Var(Rt) / Var(Tt + Rt))
Seasonal strength compares the variance of the remainder to the variance of the detrended series (seasonal plus remainder):
FS = max(0, 1 − Var(Rt) / Var(St + Rt))
Both measures lie in [0, 1]. Values near 1 mean nearly all variation is explained by that component; near 0 means it is absent. In fpp3: feat_stl.
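
A sketch of the standard strength scatterplot on the tourism data (for quarterly series, feat_stl names the seasonal column seasonal_strength_year):
library(fpp3)
tourism |>
  features(Trips, feat_stl) |>
  ggplot(aes(x = trend_strength, y = seasonal_strength_year, colour = Purpose)) +
  geom_point() +
  facet_wrap(vars(State))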

Other useful features

Spectral entropy — measures forecastability.
  • Based on the spectral density of the series. Values near 0 = highly forecastable (strong patterns). Values near 1 = close to white noise (hard to forecast).
  • Useful for ranking a large collection of series by difficulty.
Number of peaks and troughs in the ACF.
  • Captures whether the ACF oscillates (seasonal or cyclical) or simply decays.
Unit root test statistics (KPSS, ADF).
  • Indicate whether the series is stationary or needs differencing.
  • The KPSS statistic is used by unitroot_ndiffs() to choose the number of regular differences automatically (see the sketch below).
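
Both kinds of features are available in feasts; a quick sketch on the tourism data:
library(fpp3)
# Spectral entropy: near 0 = strongly patterned, near 1 = noise-like
tourism |> features(Trips, feat_spectral)
# Number of first differences suggested by the (KPSS-based) test
tourism |> features(Trips, unitroot_ndiffs)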

What does it mean for a time series to be stationary?

A stationary series has statistical properties that do not change over time.
Formally, a series is (weakly) stationary if:
  • The mean is constant: E[yt] = μ for all t.
  • The variance is constant: Var(yt) = σ² for all t.
  • The autocovariance between yt and yt−k depends only on the lag k, not on t.
Why it matters: most time series models (ARIMA, in particular) require stationarity. A trended or heteroskedastic series must be transformed before fitting these models.
Common fix: first differencing removes a trend. Taking logs before differencing handles growing variance.
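
A sketch of the common fix, using Gas from aus_production (tsibbledata), which has both a trend and growing variance:
library(fpp3)
# log() stabilizes the variance; difference() removes the trend
aus_production |>
  autoplot(difference(log(Gas)))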

Differencing to achieve stationarity

First difference: Δyt = yt − yt−1
  • Removes a linear trend. A series that requires one difference is called I(1) (integrated of order 1).
  • Most macroeconomic series are I(1): GDP, employment, prices.
Second difference: Δ²yt = Δyt − Δyt−1
  • Removes a quadratic trend. More than two differences are rarely needed in practice.
Seasonal difference: Δmyt = yt − yt−m
  • Removes a stable seasonal pattern by subtracting the value from one full season ago.
  • Often combined with a regular first difference: difference seasonally, then difference again.
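
The corresponding operations in fpp3 use difference() with an optional lag; a sketch on quarterly data (so m = 4):
library(fpp3)
aus_production |>
  mutate(
    d_gas  = difference(Gas),                  # Δyt: regular first difference
    ds_gas = difference(Gas, lag = 4),         # Δ4 yt: seasonal difference
    dd_gas = difference(difference(Gas, 4))    # ΔΔ4 yt: seasonal, then regular
  )
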
Unit root tests formally test whether differencing is needed.
KPSS test (Kwiatkowski–Phillips–Schmidt–Shin): tests the null that the series is stationary. A small p-value rejects stationarity → difference the series.
ADF test (Augmented Dickey–Fuller): tests the null that the series has a unit root (is non-stationary). A small p-value rejects the unit root → no differencing needed.
In fpp3, unitroot_ndiffs() combines these tests to automatically select the number of regular differences needed, and unitroot_nsdiffs() selects the number of seasonal differences.
Note: always confirm the automatic suggestion with a time plot and ACF. Automated tests can fail on short or irregular series.
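
A sketch of the automatic selection on monthly retail turnover (aus_retail from tsibbledata): seasonal differences are chosen first, then regular differences on the seasonally differenced series.
library(fpp3)
total_retail <- aus_retail |>
  summarise(Turnover = sum(Turnover)) |>
  mutate(log_to = log(Turnover))
total_retail |>
  features(log_to, unitroot_nsdiffs)                # seasonal differences needed
total_retail |>
  mutate(d_log = difference(log_to, lag = 12)) |>
  features(d_log, unitroot_ndiffs)                  # regular differences after that
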
A scatterplot matrix reveals relationships among multiple time series at once.
When you have several related variables (e.g., sales across regions, or GDP, inflation, and unemployment), a scatterplot matrix plots each pair of variables against each other in a grid.
What to look for:
  • Linear vs. non-linear relationships between variables.
  • Positive or negative co-movement (potential explanatory power for regression).
  • Outliers that appear in some pairs but not others.
  • Near-identical series (possible multicollinearity if used as predictors).
In R: GGally::ggpairs() or a custom loop with ggplot2.
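
A sketch with us_change, the quarterly US macro series in tsibbledata (GGally must be installed separately):
library(fpp3)
# All pairwise scatterplots of the five variables (columns 2 to 6)
us_change |>
  GGally::ggpairs(columns = 2:6)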

Using features to analyze many series at once

Companies often need to forecast thousands of series simultaneously.
  • A retailer may forecast sales for 50,000 SKUs across 300 stores.
  • Manually inspecting each series is impossible. Features make it tractable.
PCA on the feature table reduces many features to two dimensions for visualization.
  • Each series becomes a point in feature space. Clusters of similar series often share the same best model.
  • Outlier series (unusual feature values) can be flagged for manual review.
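
A sketch of the PCA workflow: compute the full feasts feature set, then use prcomp() and broom::augment() to attach the principal components:
library(fpp3)
library(broom)
tourism_features <- tourism |>
  features(Trips, feature_set(pkgs = "feasts"))
pcs <- tourism_features |>
  select(-State, -Region, -Purpose) |>    # drop the key columns
  prcomp(scale = TRUE) |>                 # PCA on standardized features
  augment(tourism_features)               # attach PCs to the feature table
pcs |>
  ggplot(aes(x = .fittedPC1, y = .fittedPC2, colour = Purpose)) +
  geom_point()
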
Features also support model selection at scale.
  • Series with high trend strength and low seasonal strength → use an ETS(A,A,N) or ARIMA with drift.
  • Series with both high trend and high seasonal strength → use ETS(A,A,A) or seasonal ARIMA.
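
These rules can be encoded directly on the feature table. A hypothetical sketch (the 0.7 thresholds are illustrative, not canonical):
library(fpp3)
tourism |>
  features(Trips, feat_stl) |>
  mutate(suggestion = case_when(
    trend_strength > 0.7 & seasonal_strength_year > 0.7 ~ "ETS(A,A,A) / seasonal ARIMA",
    trend_strength > 0.7                                ~ "ETS(A,A,N) / ARIMA with drift",
    TRUE                                                ~ "inspect manually"
  ))
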
The features() function is the workhorse for feature-based analysis.
Common usage patterns:
# All fpp3 features in one call
data |> features(variable, feature_set(pkgs = "feasts"))
# ACF features only
data |> features(variable, feat_acf)
# STL features: trend + seasonal strength
data |> features(variable, feat_stl)
The result is a tibble with one row per series and one column per feature — ready for plotting, clustering, or model selection logic.

High autocorrelation can be misleading when the series has a trend or seasonality.
A strongly trended series will show high positive autocorrelation at all lags — not because the past truly predicts the future, but because both yt and yt−k are near the same part of the trend.
This is called spurious autocorrelation. It inflates ACF values and can make a series look much more forecastable than it really is.
Fix: compute the ACF on a stationary version of the series. Difference (or detrend and deseasonalize) before examining the ACF if the series is non-stationary.
The ACF of first-differenced data answers the real question: does today’s change predict tomorrow’s change?
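
A sketch comparing the two ACFs (Gas from aus_production):
library(fpp3)
aus_production |> ACF(Gas) |> autoplot()               # trended: large at every lag
aus_production |> ACF(difference(Gas)) |> autoplot()   # differenced: the real structure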

Chapter 4 in summary

Features compress a time series into interpretable scalars.
  • Simple stats, ACF features, STL features, spectral entropy, unit root test statistics.
Trend strength and seasonal strength (from STL) are the most practically useful.
  • They guide model choice and flag series that may need special treatment.
Stationarity is a prerequisite for ARIMA modelling.
  • Use unit root tests and ACF inspection to decide how many differences are needed.
At scale, features enable automated, data-driven forecasting pipelines.
  • PCA and clustering on features reveal structure invisible in individual time plots.